AITopics | gradient algorithm

Collaborating Authors

gradient algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates

Neural Information Processing SystemsMar-16-2026, 19:54:07 GMT

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. Furthermore, under a structural sparsity assumption, we first illustrate an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step-size. Next, we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)

Add feedback

Total stochastic gradient algorithms and applications in reinforcement learning

Neural Information Processing SystemsMar-16-2026, 17:59:04 GMT

Backpropagation and the chain rule of derivatives have been prominent; however, the total derivative rule has not enjoyed the same amount of attention. In this work we show how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models. In particular, previous "policy gradient theorems" are easily derived. We derive new gradient estimators based on density estimation, as well as a likelihood ratio gradient, which "jumps" to an intermediate node, not directly to the objective function. We evaluate our methods on model-based policy gradient algorithms, achieve good performance, and present evidence towards demystifying the success of the popular PILCO algorithm.

artificial intelligence, machine learning, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)

Add feedback

5a44a53b7d26bb1e54c05222f186dcfb-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-12-2026, 07:15:59 GMT

assumption, projection, reproducibility, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

Add feedback

Total stochastic gradient algorithms and applications in reinforcement learning

Paavo Parmas

Neural Information Processing SystemsFeb-12-2026, 05:41:05 GMT

Neural Information Processing Systems http://nips.cc/

estimator, gradient, gradient estimator, (15 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)

Add feedback

AcceleratedProjectedGradientAlgorithmsfor SparsityConstrainedOptimizationProblems

Neural Information Processing SystemsFeb-11-2026, 06:56:06 GMT

A classical problem that fits in the framework of (1) is the best subset selection problem in linear regression [6,20].

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Taiwan > Taiwan Province > Taipei (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

TowardsTheoreticallyUnderstandingWhySGD GeneralizesBetterThanADAMinDeepLearning

Neural Information Processing SystemsFeb-11-2026, 02:37:17 GMT

Differently,SGD usually improves model performance slowly but could achievehigher testperformance.

artificial intelligence, inproc, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Accelerated Projected Gradient Algorithms for Sparsity Constrained Optimization Problems

Neural Information Processing SystemsDec-24-2025, 23:14:10 GMT

We consider the projected gradient algorithm for the nonconvex best subset selection problem that minimizes a given empirical loss function under an $\ell_0$-norm constraint. Through decomposing the feasible set of the given sparsity constraint as a finite union of linear subspaces, we present two acceleration schemes with global convergence guarantees, one by same-space extrapolation and the other by subspace identification. The former fully utilizes the problem structure to greatly accelerate the optimization speed with only negligible additional cost. The latter leads to a two-stage meta-algorithm that first uses classical projected gradient iterations to identify the correct subspace containing an optimal solution, and then switches to a highly-efficient smooth optimization method in the identified subspace to attain superlinear convergence. Experiments demonstrate that the proposed accelerated algorithms are magnitudes faster than their non-accelerated counterparts as well as the state of the art.

accelerated, gradient algorithm, sparsity constrained optimization problem, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Add feedback

Debiasing Averaged Stochastic Gradient Descent to handle missing values

Neural Information Processing SystemsDec-24-2025, 08:05:59 GMT

Stochastic gradient algorithm is a key ingredient of many machine learning methods, particularly appropriate for large-scale learning. However, a major caveat of large data is their incompleteness. We propose an averaged stochastic gradient algorithm handling missing values in linear models. This approach has the merit to be free from the need of any data distribution modeling and to account for heterogeneous missing proportion. In both streaming and finite-sample settings, we prove that this algorithm achieves convergence rate of $\mathcal{O}(\frac{1}{n})$ at the iteration $n$, the same as without missing values. We show the convergence behavior and the relevance of the algorithm not only on synthetic data but also on real data sets, including those collected from medical register.

debiasing averaged stochastic gradient descent, electronic proceedings, name change, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Add feedback